Exploitation of Morphological Structures in Large Vocabulary Arabic Speech Recognition
Abstract
This paper presents a new approach to large vocabulary Arabic speech recognition that exploits the morphological structures of the Arabic language. In this model, word discrimination is achieved by a hybrid analysis scheme in which vowels are described in detail while consonants are classified according to broad phonetic classes. Different phonetic classification strategies are used to describe two large vocabulary lexicons. The results show that about 83% of the 10,000 Arabic test words can be uniquely represented by using seven broad phonetic classes for consonants and six classes for vowels. In this case, the maximum number of words sharing the same phonetic labelling is 6. This paper summarises the results of ten different phonetic classification schemes and discusses their implications for a large vocabulary speech recognition system.
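The hybrid labelling idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the consonant-to-class table, the six vowel symbols, the toy lexicon, and the function names broad_label and cohort_stats are hypothetical and are not the paper's classification schemes or data. The sketch keeps vowels as they are, collapses each consonant to a broad class, and then counts how many words receive a label that no other word shares, along with the size of the largest group of words with an identical label.

```python
from collections import defaultdict

# Hypothetical broad consonant classes (illustrative only, not the paper's scheme).
CONSONANT_CLASS = {
    "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP", "q": "STOP",
    "f": "FRIC", "s": "FRIC", "z": "FRIC", "h": "FRIC",
    "m": "NASAL", "n": "NASAL",
    "l": "LIQUID", "r": "LIQUID",
    "w": "GLIDE", "y": "GLIDE",
}

# Assumed vowel inventory: three short and three long vowels, each kept distinct.
VOWELS = {"a", "i", "u", "aa", "ii", "uu"}


def broad_label(phonemes):
    """Keep vowels in detail; replace each consonant by its broad class."""
    return tuple(p if p in VOWELS else CONSONANT_CLASS[p] for p in phonemes)


def cohort_stats(lexicon):
    """Return (fraction of uniquely labelled words, size of the largest cohort)."""
    cohorts = defaultdict(list)
    for word, phonemes in lexicon.items():
        cohorts[broad_label(phonemes)].append(word)
    unique_words = sum(1 for words in cohorts.values() if len(words) == 1)
    largest = max(len(words) for words in cohorts.values())
    return unique_words / len(lexicon), largest


# Toy lexicon of transliterated phoneme strings, made up for this example.
TOY_LEXICON = {
    "kataba": ["k", "a", "t", "a", "b", "a"],
    "kitaab": ["k", "i", "t", "aa", "b"],
    "darasa": ["d", "a", "r", "a", "s", "a"],
    "qalam": ["q", "a", "l", "a", "m"],
    "kalam": ["k", "a", "l", "a", "m"],  # same broad label as "qalam"
}

if __name__ == "__main__":
    fraction, largest = cohort_stats(TOY_LEXICON)
    print(f"uniquely labelled: {fraction:.0%}, largest cohort: {largest}")
```

In this toy lexicon, "qalam" and "kalam" collapse to the same label because both initial consonants fall into the same broad class. This is the kind of ambiguity that the abstract's uniqueness percentage and maximum cohort size quantify.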
Similar resources
Morphology-based language modeling for conversational Arabic speech recognition
Language modeling for large-vocabulary conversational Arabic speech recognition is faced with the problem of the complex morphology of Arabic, which increases the perplexity and out-of-vocabulary rate. This problem is compounded by the enormous dialectal variability and differences between spoken and written language. In this paper we investigate improvements in Arabic language modeling by deve...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving from this archive, is one of the key issues for this broadcasting...
Development of a conversational telephone speech recognizer for Levantine Arabic
Many languages, including Arabic, are characterized by a wide variety of different dialects that often differ strongly from each other. When developing speech technology for dialect-rich languages, the portability and reusability of data, algorithms, and system components becomes extremely important. In this paper, we describe the development of a large-vocabulary speech recognition system for ...
Arabic Phonetic Dictionaries for Speech Recognition
Phonetic dictionaries are essential components of large-vocabulary speaker-independent speech recognition systems. This paper presents a rule-based technique to generate phonetic dictionaries for a large vocabulary Arabic speech recognition system. The system used conventional Arabic pronunciation rules, common pronunciation rules of Modern Standard Arabic, as well as some common dialectal case...
Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR
One of the challenges in large vocabulary Arabic speech recognition is the rich morphology of the Arabic language, which leads to both high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Another challenge is the absence of short vowels (diacritics) from written Arabic transcripts, which causes a large difference between spoken and written language and ...
Journal title:
- Int. J. Comput. Proc. Oriental Lang.
Volume 18
Publication date: 2005